    Automated question answering over the web: An adaptive search and retrieval strategy

    The problem of efficiently finding answers to natural language questions on the web has received considerable attention. Current experimental question answering models work well only for smaller, specific document collections, or handle only short, single-fact factoid questions. Other, more broadly focused models merely retrieve and re-rank a set of documents most likely to contain an answer. These approaches rely on only a few specific question answering strategies. A more comprehensive and dynamic model may provide better performance at both retrieving candidate answer pools and extracting specific answers. Such a model will be designed to efficiently combine automatic question reformulation, search strategy selection, query expansion, and answer extraction/pooling techniques. The system will automatically learn question reformulations for the most popular web search engines from training collections of question-answer pairs, such as FAQs. Questions will be matched against automatically learned question types and reformulated into queries based on answer phrases likely to appear in a document containing the answer. The semantic answer type will also be determined from the question type and used to recognize potential answers. During system training, the top-ranked documents retrieved by the search engine will be examined for their likelihood of containing an appropriate answer. During live runs, user answer-acceptance feedback will be collected to re-rank document entries and/or refine new queries as necessary.
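    The question-to-query reformulation described above can be illustrated with a minimal sketch. This is not the paper's learned system: the hand-written rules and pattern names below are hypothetical stand-ins for reformulations that the proposed system would learn automatically from question-answer pairs.

    ```python
    import re

    # Illustrative hand-written rules (the paper proposes *learning* such
    # mappings from FAQ-style question-answer training pairs): each rule maps
    # a question pattern to an answer phrase likely to appear verbatim in a
    # document that contains the answer.
    REFORMULATION_RULES = [
        (re.compile(r"^who invented (the )?(?P<x>.+)\?$", re.I),
         "{x} was invented by"),
        (re.compile(r"^when was (the )?(?P<x>.+) built\?$", re.I),
         "{x} was built in"),
        (re.compile(r"^what is (the )?(?P<x>.+)\?$", re.I),
         "{x} is"),
    ]

    def reformulate(question: str) -> str:
        """Return a search query built from the matched answer phrase,
        falling back to the raw question when no rule applies."""
        for pattern, template in REFORMULATION_RULES:
            match = pattern.match(question.strip())
            if match:
                return template.format(x=match.group("x"))
        return question

    print(reformulate("Who invented the telephone?"))
    # → telephone was invented by
    ```

    The reformulated phrase is then submitted to a web search engine, on the assumption that exact-phrase matches rank answer-bearing documents higher than the bag of question words would.
    
    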

    Semantic analysis for improved multi-document summarization of text

    An excess of unstructured data is easily accessible in digital format. This information overload places too heavy an analysis burden on society. Focused (i.e., topic-, query-, question-, or category-driven) multi-document summarization is an information reduction solution whose state of the art now demands exploration of further techniques for modeling human summarization activity. Existing techniques have been mainly extractive, relying on term distributions and complex machine learning over corpora to perform close to human summaries. While these techniques remain in use, the field needs to move toward more abstractive approaches that model the human way of summarizing. A simple, inexpensive, and domain-independent system architecture is created for adding semantic analysis to the summarization process. The proposed system is novel in its use of a new semantic analysis metric to better score sentences for selection into a summary. It also simplifies semantic processing of sentences to better capture semantically related information, reduce redundancy, and reduce complexity. The system is evaluated against participants in the Document Understanding Conference and the later Text Analysis Conference using the ROUGE performance measures of n-gram recall among automated system, human, and gold standard baseline summaries. The goal was to show that semantic analysis can perform well for summarization while remaining simple and inexpensive, without significant loss of recall compared to the foundational baseline system. Current results show improvement over the gold standard baseline when all factors of this work's semantic analysis technique are used in combination. These factors are the semantic cue words feature and semantic class weighting, used to identify sentences with important information. The semantic triples clustering, used to decompose natural language sentences to their most basic meaning and select the most important sentences, also contributed to this improvement. In competition against the gold standard baseline system on the standardized summarization evaluation metric ROUGE, this work outperforms the baseline by more than ten ranking positions. This work shows that semantic analysis and lightweight, open-domain techniques have potential.
    Ph.D., Information Studies -- Drexel University, 201
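    The ROUGE n-gram recall measure used for the evaluation above can be sketched in a few lines. This is a minimal illustration of ROUGE-N recall with standard count clipping, not the official ROUGE toolkit (which adds stemming, stopword handling, and multi-reference jackknifing).

    ```python
    from collections import Counter

    def ngrams(tokens, n):
        """Multiset of n-grams in a token sequence."""
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def rouge_n_recall(candidate: str, reference: str, n: int = 2) -> float:
        """ROUGE-N recall: the clipped fraction of reference n-grams that
        also appear in the candidate summary."""
        cand = ngrams(candidate.lower().split(), n)
        ref = ngrams(reference.lower().split(), n)
        if not ref:
            return 0.0
        overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
        return overlap / sum(ref.values())

    print(rouge_n_recall("the cat sat", "the cat sat on the mat"))
    # → 0.4  (2 of the reference's 5 bigrams are recalled)
    ```

    Recall is the natural direction for DUC/TAC-style evaluation because reference summaries are length-capped: the question is how much of the human summary's content the system recovered.
    
    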

    Focused multi-document summarization: Human summarization activity vs. automated systems techniques

    Focused Multi-Document Summarization (MDS) is concerned with summarizing the documents in a collection with respect to a particular external request (i.e., a query, question, or topic), or focus. Although the current state of the art performs reasonably well on DUC/TAC-style evaluations (i.e., government and news concerns), other considerations need to be explored. This paper briefly surveys the state of the art in automated system techniques and compares it with human summarization activity.
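    The core extractive step behind focused MDS, as surveyed above, can be sketched as ranking sentences by similarity to the focus request. The bag-of-words cosine scoring below is a generic illustration of that step, not any particular system from the paper.

    ```python
    import math
    from collections import Counter

    def cosine(a: Counter, b: Counter) -> float:
        """Cosine similarity between two bag-of-words vectors."""
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def focused_extract(sentences, query, k=2):
        """Rank candidate sentences by similarity to the focus query and
        return the top k as a crude extractive summary."""
        qv = Counter(query.lower().split())
        ranked = sorted(sentences,
                        key=lambda s: cosine(Counter(s.lower().split()), qv),
                        reverse=True)
        return ranked[:k]
    ```

    Real systems layer redundancy removal (e.g., MMR-style reranking) and positional or centrality features on top of this relevance score; the contrast with human summarizers is that people abstract and rephrase rather than select whole sentences.
    
    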